An alpha-helical lid guides the target DNA toward catalysis in CRISPR-Cas12a

CRISPR-Cas12a is a powerful RNA-guided genome-editing system that generates double-strand DNA breaks using its single RuvC nuclease domain by a sequential mechanism in which initial cleavage of the non-target strand is followed by target strand cleavage. How the spatially distant DNA target strand traverses toward the RuvC catalytic core is presently not understood. Here, continuous tens of microsecond-long molecular dynamics and free-energy simulations reveal that an α-helical lid, located within the RuvC domain, plays a pivotal role in the traversal of the DNA target strand by anchoring the crRNA:target strand duplex and guiding the target strand toward the RuvC core, as also corroborated by DNA cleavage experiments. In this mechanism, the REC2 domain pushes the crRNA:target strand duplex toward the core of the enzyme, while the Nuc domain aids the bending and accommodation of the target strand within the RuvC core by bending inward. Understanding of this critical process underlying Cas12a activity will enrich fundamental knowledge and facilitate further engineering strategies for genome editing.

the DED motif in RuvC (Supplementary Fig. 2b).The two catalytic Mg 2+ ions were located in the RuvC site of Cas12a by superimposing the DDE motif of the two structures (6GTG [https://doi.org/10.2210/pdb6GTG/pdb](Cas12a -I4 Conformation) and 6LTU [https://doi.org/10.2210/pdb6LTU/pdb](Cas12i2 ternary complex)).Finally, the complete Cas12a system was subjected to force field-based vacuum minimization to obtain an energetically stable complex, further used for long timescale MD simulations.Importantly, distance analyses of the DDE motif coordinating with the Mg 2+ ions along ~10 µs-long MD simulation (Supplementary Fig. 2c) underscores the stability of the two-metal ion architecture in the system.
The catalytically competent state of Cas12a for TS cleavage was built taking cues from related crystal structures of Cas12a orthologues, holding the DNA TS within the RuvC active site.
The X-ray structure of the CRISPR-Cas12b (PDB: 5U30 [https://doi.org/10.2210/pdb5U30/pdb](AacC2c1-sgRNA-extended target DNA ternary complex), 11 at 2.9 Å resolution) accommodates the TS within the RuvC catalytic cleft (Supplementary Fig. 8a).The TS bends down from the crRNA:TS duplex toward the active site, forming a U-shaped strand with its 5'-tail interacting with the Nuc domain.Cas12a and Cas12b share a highly conserved RuvC catalytic domain, which is superposable in the 6GTG [https://doi.org/10.2210/pdb6GTG/pdb](Cas12a -I4 Conformation) and 5U30 [https://doi.org/10.2210/pdb5U30/pdb](AacC2c1-sgRNA-extended target DNA ternary complex) structures (Supplementary Fig. 8b).A comparison of the two structures shows that the nucleic acids are also superposable and differ for a longer U-shaped TS that locates with the RuvC active site in Cas12b.Building on this similarity, the 5U30 [https://doi.org/10.2210/pdb5U30/pdb](AacC2c1-sgRNA-extended target DNA ternary complex) structure was used as a template for biased MD simulations, starting from the complete Cas12a system that includes Mg 2+ ions in the RuvC site.Restraint forces were applied to reduce the RMSD between initial and final conformations of the DNA nucleobases -1 to -5, which include the scissile phosphate, similar to a prior modeled structure of Cas12a. 2 The following potential was applied: where k is the collective force constant, N is the number of targeted atoms, RMS(t) is the actual  between the current (at time t, initially) and target configurations, and RMS*(t) is reduced with a constant velocity from the initial  value between the initial and target configurations toward the desired final  (i.e., zero).The difference in RMSD was gradually reduced in 30 steps of 20 ns each with a force constant k of 30 kcal mol -1 Å -2 , collecting an overall ensemble of ~600 ns.This protocol leads to a relaxed trajectory, allowing the protein to adapt and accommodate the transition.The resultant structure contains the scissile phosphate, between the nucleobases T-2 and G-3, which coordinates the catalytic Mg 2+ ions (Supplementary Fig. 8c,d).
This complex was subjected to ~10 µs of MD simulation for further refinement and to ensure long timescale stability (Supplementary Fig. 9).Along the simulations, the active site is stable (Supplementary Fig. 9c) and the DDE motif steadily coordinates Mg 2+ (Supplementary Fig. 9d).
All the systems were embedded in explicit waters, while counterions were added to neutralize the total charge at physiological conditions, leading to periodic simulation cells of ~138*149*167 Å 3 and a total of ~307,000 atoms for each system.
Molecular Dynamics (MD) simulations.Conventional MD simulations were performed to equilibrate the systems prior to long timescale MD on Anton-2. 12A simulation protocol tailored for RNA/DNA nucleases was adopted, embracing the use of the Amber ff19SB 13 force field, which includes the ff99bsc1 corrections for DNA 14 and the ff99bsc0+χOL3 corrections for RNA. 15,16The TIP3P model was employed for explicit water molecules. 17The Li & Merz 12-6 model of nonbonded interactions was used for Mg 2+ ions. 18We have extensively employed these force field models in computational studies of CRISPR-Cas systems, 19 showing also that they perform well for long timescale simulations on Anton-2. 20The Li & Merz model also reported a good description of Mg 2+ bound sites, in agreement with quantum/classical simulations. 21An integration time step of 2 fs was employed.All bond lengths involving hydrogen atoms were constrained using the SHAKE algorithm.Temperature control (300 K) was performed via Langevin dynamics, 22 with a collision frequency γ = 1.Pressure control was accomplished by coupling the system to a Berendsen barostat 23 at a reference pressure of 1 atm and with a relaxation time of 2 ps.The systems were subjected to energy minimization to relax water molecules and counter ions, keeping the protein, the RNA, DNA and Mg 2+ ions fixed with harmonic position restraints of 300 kcal/mol • Å 2 .Then, the systems were heated up from 0 to 100 K in a canonical ensemble (NVT), by running two simulations of 5 ps each, imposing position restraints of 100 kcal/mol • Å 2 on the above-mentioned elements of the system.The temperature was further increased up to 200 K in ~100 ps of MD in the isothermal-isobaric ensemble (NPT), reducing the restraint to 25 kcal/mol • Å 2 .Subsequently, all restraints were released, and the temperature of the systems was raised up to 300 K in a single NPT simulation of 500 ps.After ~ 1.1 ns of equilibration, ~10 ns of NPT runs were carried out allowing the density of the systems to stabilize around 1.01 g cm -3 .Finally, production runs were carried out in the NVT ensemble, collecting ~120 ns.These simulations have been performed using the GPU-empowered version of AMBER 20. 24 The well-equilibrated systems were used as starting points for simulations on Anton-2. 12ong timescale MD simulations of the CRISPR-Cas12a systems were performed using Anton-2, 12 a special-purpose supercomputer for micro-to-millisecond length MD simulations, starting from well-equilibrated configurations, obtained after ~120 ns of conventional MD (vide supra).Simulations on Anton-2 were performed using the same force field parameters used for conventional MD simulations.A reversible multiple time step algorithm 25 was employed to integrate the equations of motion with a time step of 2 fs for short-range nonbonded and bonded forces and 6 fs for the long-range nonbonded forces.Simulations were performed in the NPT ensemble using the multigrator integrator as implemented in Anton-2. 26Pressure control was accomplished via the Martyna, Tobias, Klein (MTK) barostat, 27 set to maintain 1 bar of pressure, with a tau (piston time constant) parameter of 0.0416667 ps and reference temperature of 310.15 K.The barostat period was set to the default value of 480 ps per timestep.Temperature control was accomplished via the Nosé-Hoover thermostat 28,29 with a tau (time constant) parameter of 0.0416667 ps.The k-Gaussian split Ewald method 30 was used for long-range electrostatic interactions.Hydrogen atoms were added assuming standard bond lengths and were constrained to their equilibrium position with the SHAKE algorithm. 31By using this approach and the Anton-2 supercomputer, 12 each model system was simulated for ~10 μs and in replicates.Principal Component Analysis.PCA is a statistical method used to reduce a large number of degrees of freedom to an essential subspace set, which reports on the large-amplitude, collective motions in biological molecules undergoing MD simulations.In PCA, the covariance matrix of the protein Cα atoms is calculated and diagonalized to obtain a new set of coordinates (eigenvectors) to describe the system motions.Each eigenvector -also called Principal Component (PC) -is associated with an eigenvalue corresponding to the mean square fluctuation contained in the system's trajectory projected along that eigenvector.By sorting the eigenvectors according to their eigenvalues, the first PC1 corresponds to the system's largest amplitude motion, and the dynamics of the system along PC1 are usually referred "essential dynamics". 32,33e, PCA has been performed considering the simulations of the complete Cas12a after NTS cleavage (Fig. 1c, Supplementary Table 1).First, the collective ensembles were combined and subjected to RMS-fit to the same reference configuration, removing rotational and translational motions, to ensure a consistent eigenbasis on all compared systems.Then, the principal motions of the protein have been captured starting from the mass-weighted covariance matrix of the Cα atoms.Each element in this matrix is the covariance between atoms  and , defining the ,  position of the covariance matrix ( %& ): where  ⃗ % and  ⃗ & are the position vectors of atoms  and , and the brackets denote an average over the sampled time period.A positive sign of this product indicates that the two atoms move in a correlated manner, otherwise, a negative value points to anti-correlated atoms.If the product is zero, then it evinces that the atoms' displacements are independent of each other.The covariance matrix was then diagonalized, leading to a complete set of orthogonal collective eigenvectors, each associated with a corresponding eigenvalue.The eigenvalues denote how much each eigenvector is representative of the system dynamics, thus giving a measure of the contribution of each eigenvector to the total variance.Indeed, the eigenvectors with the largest eigenvalues correspond to the most relevant motions.By projecting the displacement vectors of each atom along the trajectory onto the eigenvectors (i.e., by taking the dot product between the two vectors at each frame), the PCs were obtained.The first PC1, commonly referred to as "essential dynamics", 32,33 is plotted on the 3D structure of the complete CRISPR-Cas12a system indicating the largest amplitude motions of the protein (Supplementary Fig. 6e).PCA has been performed using cpptraj of AMBER 20, 24 while the Normal Mode Wizard 34 plugin of the Visual Molecular Dynamics 35 program has been used for the graphical rendering.
Umbrella Sampling simulations.The umbrella sampling (US) method 36 was used to compute the free energy profile associated with the traversal of the DNA TS toward the RuvC active site.In this method, several simulations (US windows) are run in parallel with additional harmonic bias potential applied to selected Reaction Coordinates (): where () is the value of the bias potential,  is a bias force constant, () is the value of  at given time  and * is the reference value of .By using different * values in each US window, one can sample the biased probability distribution  ' () along the whole  range of interest.Thanks to a simple reweighting scheme providing the unbiased probabilities, the free energy of a process along the  can be computed (vide infra).
We performed two-dimensional Umbrella Sampling (2-D US) simulations at 300 K using two reaction coordinates ().1 was the difference in root-mean-square deviation () of all the C⍺ atoms between the initial (Fig. 1c) and the final (Supplementary Fig. 8d) states.2 was the distance between the centre of mass of the RuvC catalytic core (D917, E1006, D1255 coordinating with the two catalytic Mg 2+ ions) and of the TS region including the scissile phosphate (i.e., the DNA nucleobases at positions -2 to -4, as indicated by biochemical and single-molecule experiments). 5,37Restraint forces were applied on the two  as follows.The difference in  between initial and final conformations was reduced, according to the following potential: where k is the collective force constant, N is the number of targeted atoms, RMS(t) is the actual  between the current (at time t, initially) and target configurations, and RMS*(t) is reduced with a constant velocity from the initial  value (~6 Å) between the initial and target configurations toward the desired final  (i.e., zero).1 was divided into bin sizes of 0.2 Å  difference, and each window was run for ~10 ns with a collective force constant k of 50 kcal mol -1 Å -2 .Additionally, restraint forces were also applied to reduce the 2 from ~30 Å to ~6.4 Å, according to the same equation.2 was discretized into bin sizes of 0.2 Å and individual windows were run for ~15 ns with a collective force constant k of 50 kcal mol -1 Å -2 .Overall, we simulated 28 (1) * 119 (2) = 3,332 windows, resulting in a collective ensemble of ~50 µs simulations.
Two independent sets of 2-D US simulations were performed: (i) for the wild-type system and (ii) upon mutating relevant residues of the ⍺-helical lid (N1009, F1012, K1013, R1014, R1016, K1018) into alanine.Then, the free energy profiles were computed using the Weighted Histogram Analysis (WHAM) method 38 upon removing the initial 1/3 rd of the US trajectories from each window, corresponding to the relaxation of the system.The unbiased probability distribution ( ( ()) was recovered from the biased probability distribution ( ' ()) through the formula: ( () =  ' () *  ( where () is the value of the applied potential at given  in a given US window,  1 is the Boltzmann constant,  is the reference temperature,  corresponds to an expected value of  )(+,) for each US window and is optimized with the WHAM algorithm. 38Analysis of the conformational ensembles was performed on the reweighted trajectories using the exponential factor  ( Analysis of the free energy landscape was performed by computing the minimum free energy pathway using an approach similar to the Dijkstra algorithm. 39In detail, the map is sampled from the origin point, propagating to the lowest energy point available on each iteration until it reaches the target point.This enabled us to consider both the RCs along a minimum free energy path.The convergence of the free energy profiles was evaluated by plotting the minimum free energy pathways, considering the remaining 10 ns runs from each US window, for ~5 ns through ~10 ns (Supplementary Fig. 12).The convergence is reached ~8 ns onward for both systems and hence we used bin sizes of ~10 ns to compute our 2-D free energy surfaces (Supplementary Fig. 11) and the minimum free energy paths (Fig. 4 and Supplementary Fig. 11).The error estimation on the minimum free energy pathways was performed using the Monte Carlo bootstrap error analysis. 40culation of bending angles.Conformational changes of the protein and nucleic acids were monitored through four critical bending angles describing: (i) crRNA:TS duplex arching, (ii) Rec2 bending and (iii) Nuc bending, (iv) TS bending.In cases (i), (ii) and (iii), we computed the angle between two vectors passing through two regions of interest (defined as mask 1 and mask 2, Supplementary Table 2).We generated a plane based on the first frame of MD production runs by fitting the Cα atoms of the residues under selection (or P atoms for the nucleobases) with the smallest mean square error.A vector ( E⃗ ) was generated through the centre of mass of two critical points of the selection relevant to the conformational change of interest (viz., mask1 to mask2, Supplementary Table 2) on the plane.Another vector ( ⟂ EEEE⃗ ), perpendicular to  E⃗ , out of the plane was generated from the frame of reference.The bending angles were studied by calculating the angle of the vector ( E⃗ ) along the trajectory with respect to  ⟂ EEEE⃗ .For the calculation of (iv) TS bending, we calculated the angle between two vectors ( 2 EEE⃗ and  " EEE⃗ ) passing through the centre of mass of two critical points of the selection relevant to the conformational change of interest (viz., mask1 to mask2, Supplementary Table 2).
Alchemical free energy simulations.We computed the relative free energy changes (i.e., ) of the individual residues with respect to alanine in binding the DNA target strand (TS), by performing alchemical free energy simulations. 41Here, the DNA target strand (TS, ligand) bound to the Cas12a complex (receptor) through the α-helical lid was decoupled to an unbound state following the alchemical free energy formalism using softcore (SC) potentials, 42 by gradually diminishing the electrostatic and van der Waals interactions of the TS (A14, C2, G1 and G-1) bases that interact with the lid.We used a lambda (λ) dynamic variable to define the thermodynamic states of the system along this decoupling alchemical pathway.Thermodynamic integration (TI) simulations were performed at 21 λ values from 0 to 1, at an interval of 0.05, for the wild-type (WT) CRISPR-Cas12a and for each individual alanine mutant.The systems were minimized and equilibrated using the simulation protocol described above.The final binding free energies were computed using the Multistate Bennett Acceptance Ratio (MBAR) method 43,44 to integrate the free energies over the different lambda values. 45,46The relative binding free energy () was calculated as follows: In detail, the pairwise interactions of the DNA TS region (A14, C2, G1 and G-1) responsible for binding the lid (L1008 to K1021) was "softened" by modifying the Lenard-Jones () softcore potentials for van der Waal interactions and Coulombic () softcore potentials for electrostatic interactions as follows: and  ,>(? =, % %& ; -=  ,>(? [ %& ,>(? (; )] where,  and  are adjustable positive semidefinite parameters for the  and  softcore interactions, respectively, with values of zero corresponding to no softcore modification for λ value.The long-range interaction cut-off, temperature, and pressure were set as described above for Molecular Dynamics simulations.During TI, the potentials were calculated "on the fly", i.e., as the simulation evolved through the different values of the λ parameter that progressively removed the non-bonded interactions of the ligand (i.e., the TS) using softcore potentials to transition from a bound to an unbound state.
For each system (i.e., WT CRISPR-Cas12a, and the N1009A, F1012A, K1013A, R1014A, R1016A, K1018A mutants), we considered two states A and B based on the reaction coordinate values along the minimum free energy path of TS traversal (Fig. 4a).Alchemical simulations were performed at 21 values of λ (from 0 to 1, in steps of 0.05), reaching ~30 ns for each window.
Convergence of the  for each system is reported in Supplementary Fig. 16.Considering all the systems and states investigated here, a total of ~8.8 μs sampling was reached.
The FnCas12a gene was inserted into the 1B plasmid (Addgene #29653) using ligationindependent cloning (LIC), resulting in a construct carrying an N-terminal hexahistidine tag followed by a tobacco etch virus (TEV) protease cleavage site.Point mutations were introduced by Gibson assembly of the PCR-amplified vector backbone with gBlock Gene Fragments (IDT) encoding the individual mutations.The sequences of the synthetic gene, primer and gBlocks are listed in Supplementary Data 1.Mutant FnCas12a constructs were purified as for wild type.
Purification of Cas12a was done as described. 47,48In brief, Cas12a constructs were expressed in Fluorescence of the ATTO532-labeled substrate and cleavage products was detected using a Typhoon FLA 9500 gel imager; the cleavage rate was quantified based on loss of uncleaved substrate DNA using ImageQuant TL v.8.2.0.Sequences of RNA and DNA oligos utilized in the nuclease assays are listed in Supplementary Table 3.
actual data points, while the solid lines show the running average over 5 ns windows of MD 367 simulation.368 1 min, 5 min, 15 min and 60 min.Data represents n = 3 independent replicates.Replica 1 has 561 been labelled with an asterisk sign to indicate the replica shown in Supplementary Fig. 13.562 Specifically, we obtained multiple trajectories of the 6GTG [https://doi.org/10.2210/pdb6GTG/pdb](Cas12a -I4 Conformation) cryo-EM structure (simulated for ~10 μs) and its more complete model including a //doi.org/10.2210/pdb5NFV/pdb](FnCas12a bound to R-loop) X-ray structure (~10 μs in two replicates).The structural model of the catalytically competent Cas12a including the DNA TS within the RuvC site was also simulated for ~10 μs.This resulted in a total of ~60 μs of all-atom classical MD simulations.

E
. coli BL21 Rosetta2 (DE3) cells (Novagen, Wisconsin, USA).Cells were lysed in 20 mM Tris pH 8.0, 500 mM NaCl, 5 mM imidazole, 1 µg/mL pepstatin, 200 µg/mL 4-(2-Aminoethyl)benzenesulfonyl fluoride hydrochloride (AEBSF) by ultrasonication.Clarified lysate was applied to a 10 ml Ni-NTA (Sigma-Aldrich) affinity column.The column was washed with 20 mM Tris pH 8.0, 500 mM NaCl, 5 mM imidazole, and bound protein was eluted by increasing imidazole concentration to 250 mM.Eluted protein was dialysed against 20 mM HEPES pH 7.5, 250 mM KCl, 1 mM DTT, 1 mM EDTA overnight at 4 °C in the presence of TEV protease to remove the 6xHis-affinity tag.Cleaved protein was further purified using a HiTrap HP Heparin column (GE Healthcare, Illinois, USA), eluting with a linear gradient to 1.0 M KCl.Elution fractions were pooled, concentrated, and further purified by size exclusion chromatography using a Superdex 200 (16/600) column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 1 mM DTT yielding pure, monodisperse proteins.Aliquots were flash-frozen in liquid nitrogen and stored at -80°C.FnCas12a nuclease activity assays .In vitro nuclease activity assays were conducted using purified WT or mutant FnCas12a proteins programmed with a crRNA targeting the λsequence (oMS017, IDT, HPLC purified) and a dsDNA substrate containing fluorescently labeled TS (oDS285:oDS271, Merck, HPLC purified).FnCas12a and crRNA were mixed in a ratio of 1:1.2 and incubated for 10 min at 25 °C to allow binary complex formation.Reactions were started by addition of target dsDNA (FnCas12a:dsDNA, 10:1) and incubated at 37 °C.All samples were assembled in a final reaction volume of 20 µL containing 0.5 µM (mutant) FnCas12a, 0.6 µM crRNA and 50 nM dsDNA in a final buffer of 10 mM HEPES-KOH pH 7.5, 250 mM KCl, 5 mM MgCl2, 0.5 mM DTT.Reactions were stopped at indicated time points by addition of EDTA and Proteinase K (Thermo Fisher Scientific) in final concentrations of 80 mM and 0.8 mg/mL, respectively and incubated for 15 min at 37 °C.Samples were mixed with equal volume of a 2X dPAGE loading dye (95 % formamide, 25 mM EDTA), heated to 95 °C for 5 min and resolved on a 15 % denaturing (7 M Urea) polyacrylamide gels run in 0.5 X TBE buffer.Assays were conducted in technical triplicates; indicated error bars represent the standard error of mean.